Handling conversation history requires the client to manage state by storing all prior user and assistant messages and sending the full array with every subsequent API request because LLMs are stateless.
One of the most critical concepts in building conversational AI is that LLMs are inherently stateless . They have no memory of previous interactions. When you call the API, it processes only the messages you provide in that single request and then "forgets" everything once the response is generated. This means that to maintain a coherent, multi-turn conversation, the responsibility of managing history falls entirely on the developer's application .
The standard practice is to store the conversation history on the client or server (e.g., in a database or session store). Each time you need the model to generate a new response, you must retrieve the entire relevant conversation history, append the latest user message, and send this full, augmented array of messages in the API request. This includes system instructions, the complete back-and-forth of the conversation, and any previous assistant responses . For long conversations, this can become inefficient, leading to the use of techniques like summarization (compaction) or context injection to manage token usage without losing important context .